library(ggplot2)
library(dplyr)

Our purpose is to take an excel file of the housing price index (hpi) for the 50 states since 1975 and get into a format that can be used to visualize the data using ggplot.

For ggplot we’ll need to get this excel data into a data.frame or tibble with an appropriate date column.

# After import, we'll have a data.frame object
# that still needs work - we'll wrangle it into shape.
# Note that FMAC updates this data every few months, and 
# the script might need to updated accordingly. In particular, FMAC might add more metadata somewhere.
library(readxl)
url <- "http://www.freddiemac.com/research/docs/states.xls"
destfile <- "states.xls"
curl::curl_download(url, destfile)
states_hpi <- read_excel(destfile, col_types = c("text", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric"), skip = 5)

Take a look at the states_hpi object.

First 10 rows:

head(states_hpi) 

Last 16 rows:

tail(states_hpi, 16)
  1. There are 16 rows of metadata at the bottom.
  2. Remove those with slice
  3. The date column is called “month”.
  4. It’s terribly, unusably formatted - what does 1975M01 mean?
  5. We’ll use mutate to create a ‘date’ column.
  6. We’ll use select to delete the poorly formatted ‘month’ column and move Date to be first
  7. Now we can move from wide to long format so that each variable has it’s own column. This is what ggplot wants.
  8. We’ll use gather for this.
  9. Let’s round off hpi using the round function. Substantively, use mutate and create a new variable called hpi to replace the old.
  10. Then it’s time to start doing some calculations. We are already have the monthly housing price index. But we want to add some of our own calculations:

    • monthly apprecation (annualized) = hpa = ((1+ (hpi-lag(hpi))/hpi)^12) - 1
    • annual apprecation = ((1+ (hpi-lag(hpi, 12))/hpi)^1) - 1
    • quarterly (annualized) = ((1+ (hpi-lag(hpi, 3))/hpi)^4) - 1
    • rolling 12 month max hpi = rollapply(hpi, 12, min, fill = NA, na.rm = FALSE, align = 'right')
    • rolling 12 month min hpi = rollapply(hpi, 12, max, fill = NA, na.rm = FALSE, align = 'right')
  11. Not necessary but what if we wanted to parse the date string. We can use separate to extract the year and month: separate(date, into = c("year", "month"), sep = '-', convert = TRUE, remove = FALSE)

states_wrangled_ggplot <- 
  states_hpi %>% 
  # remove confusing metadata
  # This next line might need to be updated if the format changes - what if FMAC decide to include
  # three more lines of metadata that we don't want.
  slice(-511:-526) %>% 
  # Add a better formatted/titled Date column
  mutate(Date = seq(as.Date('1975/01/31'), by = "month", length.out = nrow(.))) %>% 
  # remove the month column, move date column to be first
  select(Date, everything(), -Month) %>% 
  #gather from wide to long format
  gather(state, hpi, -Date) %>%
  #round hpi to four digits
  mutate(hpi = round(hpi, digits = 4)) %>%
  #create new column: hpa, hpa12, hpa3 etc.
  mutate(
         hpa = ((1+ (hpi-lag(hpi))/hpi)^12) - 1, 
         hpa12 = ((1+ (hpi-lag(hpi, 12))/hpi)^1) - 1, 
         hpa3 = ((1+ (hpi-lag(hpi, 3))/hpi)^4) - 1,
         hpi12min = rollapply(hpi, 12, min, fill = NA, na.rm = FALSE, align = 'right'),
         hpi12max = rollapply(hpi, 12, max, fill = NA, na.rm = FALSE, align = 'right')) %>% 
  ##create a column called year and month by separating the date column
  separate(Date, into = c("year", "month"), sep = '-', convert = TRUE, remove = FALSE)
head(states_wrangled_ggplot)

Take a look now. There are new columns and it’s in long format.

Let’s head to ggplot and visualize.

## Work with data from 1990-2010
states_1990_2010 <- 
  states_wrangled_ggplot %>%
  filter(Date >= "1990-01-01" & Date <= "2010-01-01")
# Put the states_1990_2010 object into the a ggplot
hpa_ggplot <- 
  # This is a time series, so the x-axis is the 'Date'
  ggplot(states_1990_2010, aes(Date)) +
  # Drop in whatever date to be charted on y-axis against time on x-axis.
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  theme_minimal() +
  geom_line(aes(y = hpi12min), colour = "blue") +
  ylab("Housing Price Index")
hpa_ggplot

ggplot works in layers and we can keep adding layers to that object.

  # can add another time series or y-axis value
hpa_ggplot <- hpa_ggplot + 
  geom_line(aes(y=hpi), colour = "red")
hpa_ggplot

We can facet out by state instead of putting all the states on one graph.

hpa_ggplot_faceted <- 
  hpa_ggplot +
  facet_wrap(~state, ncol = 10)
hpa_ggplot_faceted

Let’s add a green point at a specific date.

# Add points
hpa_ggplot_faceted <- hpa_ggplot_faceted + 
  geom_point(data = subset(states_1990_2010, Date == "2000-01-31"), 
             aes(x = Date, y = hpi), color = "green", alpha = 0.7)
hpa_ggplot_faceted

Plotly can make our ggplots a bit more interactive. After we call ggplotly try mousing over the graph.

library(plotly)
ggplotly(hpa_ggplot)
LS0tCnRpdGxlOiAiSFBJIERhdGE6IGV4Y2VsIHRvIGdncGxvdCB2aWEgZHBseXIiCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCmBgYHtyIHNldHVwLCBtZXNzYWdlID0gRkFMU0UsIHdhcm5pbmcgPSBGQUxTRX0KbGlicmFyeShnZ3Bsb3QyKQpsaWJyYXJ5KGRwbHlyKQpsaWJyYXJ5KHRpZHlyKQpsaWJyYXJ5KHpvbykKYGBgCgpPdXIgcHVycG9zZSBpcyB0byB0YWtlIGFuIGV4Y2VsIGZpbGUgb2YgdGhlIGhvdXNpbmcgcHJpY2UgaW5kZXggKGhwaSkgZm9yIHRoZSA1MCBzdGF0ZXMgc2luY2UgMTk3NSBhbmQgZ2V0IGludG8gYSBmb3JtYXQgdGhhdCBjYW4gYmUgdXNlZCB0byB2aXN1YWxpemUgdGhlIGRhdGEgdXNpbmcgZ2dwbG90LgoKRm9yIGBnZ3Bsb3RgIHdlJ2xsIG5lZWQgdG8gZ2V0IHRoaXMgZXhjZWwgZGF0YSBpbnRvIGEgZGF0YS5mcmFtZSBvciB0aWJibGUgd2l0aCBhbiBhcHByb3ByaWF0ZSBkYXRlIGNvbHVtbi4gCgoKYGBge3IsIG1lc3NhZ2UgPSBGQUxTRSwgd2FybmluZyA9IEZBTFNFfQoKIyBBZnRlciBpbXBvcnQsIHdlJ2xsIGhhdmUgYSBkYXRhLmZyYW1lIG9iamVjdAojIHRoYXQgc3RpbGwgbmVlZHMgd29yayAtIHdlJ2xsIHdyYW5nbGUgaXQgaW50byBzaGFwZS4KIyBOb3RlIHRoYXQgRk1BQyB1cGRhdGVzIHRoaXMgZGF0YSBldmVyeSBmZXcgbW9udGhzLCBhbmQgCiMgdGhlIHNjcmlwdCBtaWdodCBuZWVkIHRvIHVwZGF0ZWQgYWNjb3JkaW5nbHkuIEluIHBhcnRpY3VsYXIsIEZNQUMgbWlnaHQgYWRkIG1vcmUgbWV0YWRhdGEgc29tZXdoZXJlLgpsaWJyYXJ5KHJlYWR4bCkKdXJsIDwtICJodHRwOi8vd3d3LmZyZWRkaWVtYWMuY29tL3Jlc2VhcmNoL2RvY3Mvc3RhdGVzLnhscyIKZGVzdGZpbGUgPC0gInN0YXRlcy54bHMiCmN1cmw6OmN1cmxfZG93bmxvYWQodXJsLCBkZXN0ZmlsZSkKc3RhdGVzX2hwaSA8LSByZWFkX2V4Y2VsKGRlc3RmaWxlLCBjb2xfdHlwZXMgPSBjKCJ0ZXh0IiwgCiAgICAibnVtZXJpYyIsICJudW1lcmljIiwgIm51bWVyaWMiLCAibnVtZXJpYyIsIAogICAgIm51bWVyaWMiLCAibnVtZXJpYyIsICJudW1lcmljIiwgIm51bWVyaWMiLCAKICAgICJudW1lcmljIiwgIm51bWVyaWMiLCAibnVtZXJpYyIsICJudW1lcmljIiwgCiAgICAibnVtZXJpYyIsICJudW1lcmljIiwgIm51bWVyaWMiLCAibnVtZXJpYyIsIAogICAgIm51bWVyaWMiLCAibnVtZXJpYyIsICJudW1lcmljIiwgIm51bWVyaWMiLCAKICAgICJudW1lcmljIiwgIm51bWVyaWMiLCAibnVtZXJpYyIsICJudW1lcmljIiwgCiAgICAibnVtZXJpYyIsICJudW1lcmljIiwgIm51bWVyaWMiLCAibnVtZXJpYyIsIAogICAgIm51bWVyaWMiLCAibnVtZXJpYyIsICJudW1lcmljIiwgIm51bWVyaWMiLCAKICAgICJudW1lcmljIiwgIm51bWVyaWMiLCAibnVtZXJpYyIsICJudW1lcmljIiwgCiAgICAibnVtZXJpYyIsICJudW1lcmljIiwgIm51bWVyaWMiLCAibnVtZXJpYyIsIAogICAgIm51bWVyaWMiLCAibnVtZXJpYyIsICJudW1lcmljIiwgIm51bWVyaWMiLCAKICAgICJudW1lcmljIiwgIm51bWVyaWMiLCAibnVtZXJpYyIsICJudW1lcmljIiwgCiAgICAibnVtZXJpYyIsICJudW1lcmljIiwgIm51bWVyaWMiLCAibnVtZXJpYyIsIAogICAgIm51bWVyaWMiKSwgc2tpcCA9IDUpCmBgYAoKVGFrZSBhIGxvb2sgYXQgdGhlIHN0YXRlc19ocGkgb2JqZWN0LiAKCkZpcnN0IDEwIHJvd3M6CgpgYGB7cn0KaGVhZChzdGF0ZXNfaHBpKSAKYGBgCgpMYXN0IDE2IHJvd3M6IAoKYGBge3J9CnRhaWwoc3RhdGVzX2hwaSwgMTYpCmBgYAoKMS4gVGhlcmUgYXJlIDE2IHJvd3Mgb2YgbWV0YWRhdGEgYXQgdGhlIGJvdHRvbS4gCjIuIFJlbW92ZSB0aG9zZSB3aXRoIGBzbGljZWAKMy4gVGhlIGRhdGUgY29sdW1uIGlzIGNhbGxlZCAibW9udGgiLgo0LiBJdCdzIHRlcnJpYmx5LCB1bnVzYWJseSBmb3JtYXR0ZWQgLSB3aGF0IGRvZXMgMTk3NU0wMSBtZWFuPwo1LiBXZSdsbCB1c2UgYG11dGF0ZWAgdG8gY3JlYXRlIGEgJ2RhdGUnIGNvbHVtbi4gCjYuIFdlJ2xsIHVzZSBgc2VsZWN0YCB0byBkZWxldGUgdGhlIHBvb3JseSBmb3JtYXR0ZWQgJ21vbnRoJyBjb2x1bW4gYW5kIG1vdmUgRGF0ZSB0byBiZSBmaXJzdCAKNy4gTm93IHdlIGNhbiBtb3ZlIGZyb20gd2lkZSB0byBsb25nIGZvcm1hdCBzbyB0aGF0IGVhY2ggdmFyaWFibGUgaGFzIGl0J3Mgb3duIGNvbHVtbi4gVGhpcyBpcyB3aGF0IGdncGxvdCB3YW50cy4gCjguIFdlJ2xsIHVzZSBgZ2F0aGVyYCBmb3IgdGhpcy4gCjkuIExldCdzIHJvdW5kIG9mZiBgaHBpYCB1c2luZyB0aGUgYHJvdW5kYCBmdW5jdGlvbi4gU3Vic3RhbnRpdmVseSwgdXNlIGBtdXRhdGVgIGFuZCBjcmVhdGUgYSBuZXcgdmFyaWFibGUgY2FsbGVkIGBocGlgIHRvIHJlcGxhY2UgdGhlIG9sZC4gCjEwLiBUaGVuIGl0J3MgdGltZSB0byBzdGFydCBkb2luZyBzb21lIGNhbGN1bGF0aW9ucy4gV2UgYXJlIGFscmVhZHkgaGF2ZSB0aGUgbW9udGhseSBob3VzaW5nIHByaWNlIGluZGV4LiBCdXQgd2Ugd2FudCB0byBhZGQgc29tZSBvZiBvdXIgb3duIGNhbGN1bGF0aW9uczogCgogICAgKyBtb250aGx5IGFwcHJlY2F0aW9uIChhbm51YWxpemVkKSA9IGBocGEgPSAoKDErIChocGktbGFnKGhwaSkpL2hwaSleMTIpIC0gMWAKICAgICsgYW5udWFsIGFwcHJlY2F0aW9uID0gYCgoMSsgKGhwaS1sYWcoaHBpLCAxMikpL2hwaSleMSkgLSAxYAogICAgKyBxdWFydGVybHkgKGFubnVhbGl6ZWQpID0gYCgoMSsgKGhwaS1sYWcoaHBpLCAzKSkvaHBpKV40KSAtIDFgCiAgICArIHJvbGxpbmcgMTIgbW9udGggbWF4IGhwaSA9IGByb2xsYXBwbHkoaHBpLCAxMiwgbWluLCBmaWxsID0gTkEsIG5hLnJtID0gRkFMU0UsIGFsaWduID0gJ3JpZ2h0JylgCiAgICArIHJvbGxpbmcgMTIgbW9udGggbWluIGhwaSA9IGByb2xsYXBwbHkoaHBpLCAxMiwgbWF4LCBmaWxsID0gTkEsIG5hLnJtID0gRkFMU0UsIGFsaWduID0gJ3JpZ2h0JylgCgoxMC4gTm90IG5lY2Vzc2FyeSBidXQgd2hhdCBpZiB3ZSB3YW50ZWQgdG8gcGFyc2UgdGhlIGRhdGUgc3RyaW5nLiBXZSBjYW4gdXNlIGBzZXBhcmF0ZWAgdG8gZXh0cmFjdCB0aGUgeWVhciBhbmQgbW9udGg6IGBzZXBhcmF0ZShkYXRlLCBpbnRvID0gYygieWVhciIsICJtb250aCIpLCBzZXAgPSAnLScsIGNvbnZlcnQgPSBUUlVFLCByZW1vdmUgPSBGQUxTRSlgCgoKYGBge3IsIG1lc3NhZ2UgPSBGQUxTRSwgd2FybmluZyA9IEZBTFNFfQoKCnN0YXRlc193cmFuZ2xlZF9nZ3Bsb3QgPC0gCiAgc3RhdGVzX2hwaSAlPiUgCiAgIyByZW1vdmUgY29uZnVzaW5nIG1ldGFkYXRhCiAgIyBUaGlzIG5leHQgbGluZSBtaWdodCBuZWVkIHRvIGJlIHVwZGF0ZWQgaWYgdGhlIGZvcm1hdCBjaGFuZ2VzIC0gd2hhdCBpZiBGTUFDIGRlY2lkZSB0byBpbmNsdWRlCiAgIyB0aHJlZSBtb3JlIGxpbmVzIG9mIG1ldGFkYXRhIHRoYXQgd2UgZG9uJ3Qgd2FudC4KICBzbGljZSgtNTExOi01MjYpICU+JSAKICAjIEFkZCBhIGJldHRlciBmb3JtYXR0ZWQvdGl0bGVkIERhdGUgY29sdW1uCiAgbXV0YXRlKERhdGUgPSBzZXEoYXMuRGF0ZSgnMTk3NS8wMS8zMScpLCBieSA9ICJtb250aCIsIGxlbmd0aC5vdXQgPSBucm93KC4pKSkgJT4lIAogICMgcmVtb3ZlIHRoZSBtb250aCBjb2x1bW4sIG1vdmUgZGF0ZSBjb2x1bW4gdG8gYmUgZmlyc3QKICBzZWxlY3QoRGF0ZSwgZXZlcnl0aGluZygpLCAtTW9udGgpICU+JSAKICAjZ2F0aGVyIGZyb20gd2lkZSB0byBsb25nIGZvcm1hdAogIGdhdGhlcihzdGF0ZSwgaHBpLCAtRGF0ZSkgJT4lCiAgI3JvdW5kIGhwaSB0byBmb3VyIGRpZ2l0cwogIG11dGF0ZShocGkgPSByb3VuZChocGksIGRpZ2l0cyA9IDQpKSAlPiUKICAjY3JlYXRlIG5ldyBjb2x1bW46IGhwYSwgaHBhMTIsIGhwYTMgZXRjLgogIG11dGF0ZSgKICAgICAgICAgaHBhID0gKCgxKyAoaHBpLWxhZyhocGkpKS9ocGkpXjEyKSAtIDEsIAogICAgICAgICBocGExMiA9ICgoMSsgKGhwaS1sYWcoaHBpLCAxMikpL2hwaSleMSkgLSAxLCAKICAgICAgICAgaHBhMyA9ICgoMSsgKGhwaS1sYWcoaHBpLCAzKSkvaHBpKV40KSAtIDEsCiAgICAgICAgIGhwaTEybWluID0gcm9sbGFwcGx5KGhwaSwgMTIsIG1pbiwgZmlsbCA9IE5BLCBuYS5ybSA9IEZBTFNFLCBhbGlnbiA9ICdyaWdodCcpLAogICAgICAgICBocGkxMm1heCA9IHJvbGxhcHBseShocGksIDEyLCBtYXgsIGZpbGwgPSBOQSwgbmEucm0gPSBGQUxTRSwgYWxpZ24gPSAncmlnaHQnKSkgJT4lIAogICMjY3JlYXRlIGEgY29sdW1uIGNhbGxlZCB5ZWFyIGFuZCBtb250aCBieSBzZXBhcmF0aW5nIHRoZSBkYXRlIGNvbHVtbgogIHNlcGFyYXRlKERhdGUsIGludG8gPSBjKCJ5ZWFyIiwgIm1vbnRoIiksIHNlcCA9ICctJywgY29udmVydCA9IFRSVUUsIHJlbW92ZSA9IEZBTFNFKQoKaGVhZChzdGF0ZXNfd3JhbmdsZWRfZ2dwbG90KQpgYGAKCgpUYWtlIGEgbG9vayBub3cuIFRoZXJlIGFyZSBuZXcgY29sdW1ucyBhbmQgaXQncyBpbiBsb25nIGZvcm1hdC4gCgpMZXQncyBoZWFkIHRvIGBnZ3Bsb3RgIGFuZCB2aXN1YWxpemUuCgpgYGB7ciwgbWVzc2FnZSA9IEZBTFNFLCB3YXJuaW5nID0gRkFMU0V9CiMjIFdvcmsgd2l0aCBkYXRhIGZyb20gMTk5MC0yMDEwCnN0YXRlc18xOTkwXzIwMTAgPC0gCiAgc3RhdGVzX3dyYW5nbGVkX2dncGxvdCAlPiUKICBmaWx0ZXIoRGF0ZSA+PSAiMTk5MC0wMS0wMSIgJiBEYXRlIDw9ICIyMDEwLTAxLTAxIikKCiMgUHV0IHRoZSBzdGF0ZXNfMTk5MF8yMDEwIG9iamVjdCBpbnRvIHRoZSBhIGdncGxvdAoKaHBhX2dncGxvdCA8LSAKICAjIFRoaXMgaXMgYSB0aW1lIHNlcmllcywgc28gdGhlIHgtYXhpcyBpcyB0aGUgJ0RhdGUnCiAgZ2dwbG90KHN0YXRlc18xOTkwXzIwMTAsIGFlcyhEYXRlKSkgKwogICMgRHJvcCBpbiB3aGF0ZXZlciBkYXRlIHRvIGJlIGNoYXJ0ZWQgb24geS1heGlzIGFnYWluc3QgdGltZSBvbiB4LWF4aXMuCiAgdGhlbWUoYXhpcy50ZXh0LnggPSBlbGVtZW50X3RleHQoYW5nbGUgPSA5MCwgaGp1c3QgPSAxLCB2anVzdCA9IDAuNSkpICsKICB0aGVtZV9taW5pbWFsKCkgKwogIGdlb21fbGluZShhZXMoeSA9IGhwaTEybWluKSwgY29sb3VyID0gImJsdWUiKSArCiAgeWxhYigiSG91c2luZyBQcmljZSBJbmRleCIpCgpocGFfZ2dwbG90CmBgYAoKZ2dwbG90IHdvcmtzIGluIGxheWVycyBhbmQgd2UgY2FuIGtlZXAgYWRkaW5nIGxheWVycyB0byB0aGF0IG9iamVjdC4gCgpgYGB7cn0KICAjIGNhbiBhZGQgYW5vdGhlciB0aW1lIHNlcmllcyBvciB5LWF4aXMgdmFsdWUKaHBhX2dncGxvdCA8LSBocGFfZ2dwbG90ICsgCiAgZ2VvbV9saW5lKGFlcyh5PWhwaSksIGNvbG91ciA9ICJyZWQiKQoKaHBhX2dncGxvdAoKCmBgYAoKV2UgY2FuIGZhY2V0IG91dCBieSBzdGF0ZSBpbnN0ZWFkIG9mIHB1dHRpbmcgYWxsIHRoZSBzdGF0ZXMgb24gb25lIGdyYXBoLgoKYGBge3J9CmhwYV9nZ3Bsb3RfZmFjZXRlZCA8LSAKICBocGFfZ2dwbG90ICsKICBmYWNldF93cmFwKH5zdGF0ZSwgbmNvbCA9IDEwKQoKaHBhX2dncGxvdF9mYWNldGVkCmBgYAoKTGV0J3MgYWRkIGEgZ3JlZW4gcG9pbnQgYXQgYSBzcGVjaWZpYyBkYXRlLiAKCmBgYHtyfQojIEFkZCBwb2ludHMKaHBhX2dncGxvdF9mYWNldGVkIDwtIGhwYV9nZ3Bsb3RfZmFjZXRlZCArIAogIGdlb21fcG9pbnQoZGF0YSA9IHN1YnNldChzdGF0ZXNfMTk5MF8yMDEwLCBEYXRlID09ICIyMDAwLTAxLTMxIiksIAogICAgICAgICAgICAgYWVzKHggPSBEYXRlLCB5ID0gaHBpKSwgY29sb3IgPSAiZ3JlZW4iLCBhbHBoYSA9IDAuNykKCmhwYV9nZ3Bsb3RfZmFjZXRlZApgYGAKClBsb3RseSBjYW4gbWFrZSBvdXIgZ2dwbG90cyBhIGJpdCBtb3JlIGludGVyYWN0aXZlLiBBZnRlciB3ZSBjYWxsIGBnZ3Bsb3RseWAgdHJ5IG1vdXNpbmcgb3ZlciB0aGUgZ3JhcGguCgpgYGB7ciwgbWVzc2FnZSA9IEZBTFNFLCB3YXJuaW5nID0gRkFMU0V9CmxpYnJhcnkocGxvdGx5KQpnZ3Bsb3RseShocGFfZ2dwbG90KQpgYGAKCg==